adding language analyzers #8591

Merged

AntonEliatra
Contributor

Description

Adding the Arabic language analyzer.

Issues Resolved

Part of #1483 addressed in this PR.

Version

all

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developer Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

AntonEliatra marked this pull request as ready for review on November 4, 2024, 12:26
AntonEliatra changed the title from "adding arabic language analyzer" to "adding language analyzers" on Nov 4, 2024
vagimeli added the 3 - Tech review, 4 - Doc review, and Content gap labels on Nov 4, 2024
Collaborator

kolchfa-aws left a comment

Thank you, @AntonEliatra! Please apply my changes, and we'll get this to editorial review.

_analyzers/language-analyzers/index.md
nav_order: 100
parent: Analyzers
has_children: true
has_toc: false
Collaborator

I think it's not a bad idea to have the TOC here. It will list all analyzers on the bottom under Related articles.

Contributor Author

done


#### Example request

The following query specifies the `french` language analyzer for the index `my-index`:
Collaborator

This request creates a text field with a `french` subfield configured with the `french` analyzer. I think we can keep this request but correct its description. Also, it would be nice to provide a plain example that sets an analyzer for the whole index.

Contributor Author

I added an additional example.
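
For readers following this thread, the two request shapes being discussed (a subfield that uses the `french` analyzer versus an index-wide default analyzer) can be sketched roughly as follows. This is only an illustration that builds the request bodies in Python: the field name `title` is a placeholder that does not come from the PR, and the examples in the merged documentation may differ.

```python
import json

# Hypothetical field name, used only for illustration (not taken from the PR).
FIELD = "title"


def subfield_mapping_body(language: str) -> dict:
    """Index-creation body: a text field with a subfield analyzed by `language`."""
    return {
        "mappings": {
            "properties": {
                FIELD: {
                    "type": "text",
                    "fields": {
                        # The subfield is named after the analyzer, e.g. title.french.
                        language: {"type": "text", "analyzer": language}
                    },
                }
            }
        }
    }


def index_default_body(language: str) -> dict:
    """Index-creation body: make `language` the default analyzer for the index."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "default": {"type": language}
                }
            }
        }
    }


if __name__ == "__main__":
    # Either body would be sent with a request such as: PUT /my-index
    print(json.dumps(subfield_mapping_body("french"), indent=2))
    print(json.dumps(index_default_body("french"), indent=2))
```

The first body only affects the `french` subfield, while the second changes how every text field without an explicit analyzer is processed, which is the distinction the review comment asks the description to make.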

## Stem exclusion

You can also use `stem_exclusion` with this language analyzer using the following command:
Collaborator

Suggested change
- You can also use `stem_exclusion` with this language analyzer using the following command:
+ You can use `stem_exclusion` with this language analyzer using the following command:
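
For context, the command that this sentence introduces is not shown in the thread. A minimal sketch of what a `stem_exclusion` request body might look like is given below, written as Python that builds the JSON. The analyzer name and the excluded words are hypothetical placeholders chosen for illustration, not values from the PR.

```python
import json
from typing import List


def stem_exclusion_body(language: str, excluded_words: List[str]) -> dict:
    """Index-creation body: a `language` analyzer that skips stemming for some words."""
    # Hypothetical analyzer name, chosen only for this sketch.
    analyzer_name = f"stem_exclusion_{language}_analyzer"
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": language,
                        # Words listed here are indexed without being stemmed.
                        "stem_exclusion": list(excluded_words),
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Placeholder words; any terms that should keep their full form would go here.
    body = stem_exclusion_body("english", ["authority", "authorization"])
    print(json.dumps(body, indent=2))
```

The same shape should apply to each language page touched by this suggestion, with only the `type` value and the word list changing.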


## Arabic analyzer internals

The `arabic` analyzer is built using the following:
Collaborator

Global: please apply these changes to all files.

kolchfa-aws added the backport 2.18 label on Nov 6, 2024
Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: AntonEliatra <[email protected]>
Collaborator

kolchfa-aws left a comment

LGTM

Collaborator

natebower left a comment

@kolchfa-aws @AntonEliatra Please see my comments and changes and let me know if you have any questions. Thanks!


## Custom Armenian analyzer

You can create custom Armenian analyzer using the following command:
Collaborator

Suggested change
- You can create custom Armenian analyzer using the following command:
+ You can create a custom Armenian analyzer using the following command:


## Custom Basque analyzer

You can create custom Basque analyzer using the following command:
Collaborator

Suggested change
- You can create custom Basque analyzer using the following command:
+ You can create a custom Basque analyzer using the following command:


## Custom Bengali analyzer

You can create custom Bengali analyzer using the following command:
Collaborator

Suggested change
- You can create custom Bengali analyzer using the following command:
+ You can create a custom Bengali analyzer using the following command:


## Custom Brazilian analyzer

You can create custom Brazilian analyzer using the following command:
Collaborator

Suggested change
- You can create custom Brazilian analyzer using the following command:
+ You can create a custom Brazilian analyzer using the following command:


## Custom Bulgarian analyzer

You can create custom Bulgarian analyzer using the following command:
Collaborator

Suggested change
- You can create custom Bulgarian analyzer using the following command:
+ You can create a custom Bulgarian analyzer using the following command:


- Tokenizer: `thai`

- Token Filters:
Collaborator

Suggested change
- - Token Filters:
+ - Token filters:


## Custom Thai analyzer

You can create custom Thai analyzer using the following command:
Collaborator

Suggested change
- You can create custom Thai analyzer using the following command:
+ You can create a custom Thai analyzer using the following command:


- Tokenizer: `standard`

- Token Filters:
Collaborator

Suggested change
- - Token Filters:
+ - Token filters:


## Custom Turkish analyzer

You can create custom Turkish analyzer using the following command:
Collaborator

Suggested change
- You can create custom Turkish analyzer using the following command:
+ You can create a custom Turkish analyzer using the following command:

kolchfa-aws merged commit c29761c into opensearch-project:main on Nov 14, 2024
5 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 14, 2024
* adding arabic language analyzer

Signed-off-by: Anton Rubin <[email protected]>

* Add grandparent to arabic analyzer

Signed-off-by: Fanit Kolchina <[email protected]>

* adding more details

Signed-off-by: Anton Rubin <[email protected]>

* adding armenian language analyzer

Signed-off-by: Anton Rubin <[email protected]>

* adding basque bengali and brazilian language analyzers

Signed-off-by: Anton Rubin <[email protected]>

* adding bulgarian catalan and cjk language analyzers

Signed-off-by: Anton Rubin <[email protected]>

* adding czech,danish,dutch,english,estonian,finnish,french and galician analyzer docs

Signed-off-by: Anton Rubin <[email protected]>

* adding german,greek,hindi,hungarian,indonesian,irish,italian,latvian,lithuanian,norwegian and persian language analyzer docs

Signed-off-by: Anton Rubin <[email protected]>

* adding portuguese,romanian,russian,sorani,spanish,swedish,thai and turkish language analyzer docs

Signed-off-by: Anton Rubin <[email protected]>

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: AntonEliatra <[email protected]>

* updating as per pr review

Signed-off-by: Anton Rubin <[email protected]>

* fixing broken link

Signed-off-by: Anton Rubin <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: AntonEliatra <[email protected]>

* Update _analyzers/language-analyzers/index.md

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

* Add redirect to index page

Signed-off-by: Fanit Kolchina <[email protected]>

---------

Signed-off-by: Anton Rubin <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: AntonEliatra <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Fanit Kolchina <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
(cherry picked from commit c29761c)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
github-actions bot pushed a commit that referenced this pull request Nov 14, 2024
Labels: 3 - Tech review, 4 - Doc review, backport 2.18, Content gap